feat(restore): adds --dc-mapping flag to restore command #4213

VAveryanov8 · 2025-01-16T15:54:41Z

This adds support for the --dc-mapping flag to the restore command. It specifies the mapping between DCs from the backup and DCs in the restored (target) cluster. Only one use case is supported: 1-1 DC mapping. This means that squeezing (restoring dc1 and dc2 into dc3) or extending (restoring dc1 into dc1 and dc2) DCs is not supported when --dc-mapping is provided.
The syntax is:

source_dc1=target_dc1,source_dc2=target_dc2
Where:
     an equals sign (=) separates the source DC name from the target DC name
     a comma (,) separates multiple mappings

If --dc-mapping is not provided, the current behavior is preserved: each node with access to a DC can download its data. It is also allowed to provide only a subset of DCs, ignoring a source DC, a target DC, or both.
This only works with table restoration (--restore-tables=true).
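
The syntax and the 1-1 restriction described above can be sketched as a small parser. This is a minimal illustration using only the Go standard library; `parseDCMapping` and its error messages are hypothetical and are not the code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDCMapping is a hypothetical helper (not part of this PR). It parses
// "src1=dst1,src2=dst2" into a source->target map and rejects any mapping
// that is not 1-1.
func parseDCMapping(s string) (map[string]string, error) {
	mapping := make(map[string]string)
	seenTargets := make(map[string]struct{})
	if strings.TrimSpace(s) == "" {
		return mapping, nil // nothing provided: empty mapping
	}
	for _, pair := range strings.Split(s, ",") {
		source, target, ok := strings.Cut(pair, "=")
		source, target = strings.TrimSpace(source), strings.TrimSpace(target)
		if !ok || source == "" || target == "" {
			return nil, fmt.Errorf("invalid mapping %q, expected source_dc=target_dc", pair)
		}
		// The same source twice would mean "extending" one DC into several.
		if _, dup := mapping[source]; dup {
			return nil, fmt.Errorf("source DC %q is mapped more than once", source)
		}
		// The same target twice would mean "squeezing" several DCs into one.
		if _, dup := seenTargets[target]; dup {
			return nil, fmt.Errorf("target DC %q is mapped more than once", target)
		}
		mapping[source] = target
		seenTargets[target] = struct{}{}
	}
	return mapping, nil
}

func main() {
	m, err := parseDCMapping("dc1=dc2,dc2=dc1")
	fmt.Println(m, err)

	_, err = parseDCMapping("dc1=dc3,dc2=dc3")
	fmt.Println(err)
}
```

A subset of DCs is accepted as-is, since the sketch only checks the pairs that are actually provided.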

Fixes: #3829

Please make sure that:

Code is split to commits that address a single change
Commit messages are informative
Commit titles have module prefix
Commit titles have issue nr. suffix

Michal-Leszczynski · 2025-01-17T08:54:10Z

In terms of syntax:

"dc1,!dc2=>dc2" - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

I find it confusing that it's possible to specify both restored and skipped DCs in the same mapping key.
It reads like "restore dc1 and not dc2 into dc2" instead of "don't restore dc2 and restore dc1 into dc2".
IMHO it should look like:

 "!dc2;dc1=>dc2"     - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

I would suggest validating that positive and negative DC occurrences are not mixed in the mapping key and that negative DCs can't be mapped to anything, so that we avoid confusion and typos.

EDIT: The same goes here:

"dc1,dc2=>dc3,!dc4" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.

Is the negative DC placed in the mapping value alongside the positive DC?
Now I see that we can't simply allow writing !dc1, because it's ambiguous whether it's a source or a destination DC.

But I'm still in favor of not mixing positive and negative DCs here.
It could look like this:

> "=>!dc4;dc1,dc2=>dc3" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.

On the other hand, if we allow no DCs on either side of => then the ! is not needed anymore, but it's still nice for visibility.
What do you think about removing the ! in front of DC from the syntax and incorporating it into the =>? Something like:

"dc1!=>dc2,dc3;dc2=>dc1"- we want to skip dc1 in src, dc2 and dc3 in dst, and restore from dc2 in src into dc1 in dst

An alternative approach would be to add some special characters for distinguishing between src and dst DCs, but I guess that the position relative to the => already takes care of that.


Michal-Leszczynski · 2025-01-17T10:35:44Z

@VAveryanov8 I made a general review, but I didn't dive into other details, because it would be better to first discuss the current comments (other things might not matter after that).

Michal-Leszczynski · 2025-01-17T10:53:41Z

Also, we should deprecate the datacenter part of the --location flag, as it was previously used to partially control the DC mapping. When the --dc-mapping flag is present, the datacenter from --location should be ignored. When it's not, the datacenter can still be respected, but in the implementation we would still just parse it into a DC mapping on the SM side.

karol-kokoszka

@VAveryanov8 Thanks for the extensive PR and for covering a lot of different scenarios, but we don't need such complex logic.

One-to-one mapping is enough; there is no need to allow merging multiple DCs in the source or in the target.

Let's discuss the goal of DC mapping on the call.

It's enough to be able to:

restore just a single DC (that's why I don't really see the reason for introducing another flag that skips validation)
explicitly define that nodes from target DCX are expected to download the data of source DCY and call load & stream

karol-kokoszka · 2025-01-31T16:33:36Z

pkg/service/restore/worker.go

+		return nil, errors.Wrap(err, "get status")
+	}
+
+	sourceMap, targetMap := target.DCMappings.calculateMappings()


nit: sourceMap => sourceDC2TargetDCMap , targetMap => targetDC2SourceDCMap
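
To illustrate why the direction-revealing names help, here is a minimal sketch of deriving both lookup tables. It assumes a plain 1-1 `map[string]string` mapping (the shape the PR settles on later), and `buildDCMaps` is a hypothetical name, not the `calculateMappings` method from the diff.

```go
package main

import "fmt"

// buildDCMaps is a hypothetical helper (an assumption for illustration).
// From a 1-1 source->target mapping it derives both lookup directions,
// named so that the direction is obvious at the call site.
func buildDCMaps(mapping map[string]string) (sourceDC2TargetDCMap, targetDC2SourceDCMap map[string]string) {
	sourceDC2TargetDCMap = make(map[string]string, len(mapping))
	targetDC2SourceDCMap = make(map[string]string, len(mapping))
	for source, target := range mapping {
		sourceDC2TargetDCMap[source] = target
		targetDC2SourceDCMap[target] = source
	}
	return sourceDC2TargetDCMap, targetDC2SourceDCMap
}

func main() {
	s2t, t2s := buildDCMaps(map[string]string{"dc1": "dc2", "dc2": "dc1"})
	// Which target DC receives dc1's data, and which source DC feeds target dc1?
	fmt.Println(s2t["dc1"], t2s["dc1"])
}
```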

karol-kokoszka · 2025-01-31T17:02:26Z

pkg/command/restore/dcmappings.go

+
+type dcMapping struct {
+	Source []string `json:"source"`
+	Target []string `json:"target"`


The DC mapping should be a single string to a single string:
DC1 (of the source cluster) => DC2 (of the target cluster) ; DC2 (of the source cluster) => DC1 (of the target cluster)

And that's it, no need for the complex logic.

karol-kokoszka · 2025-02-04T13:16:34Z

@VAveryanov8 I see some recent pushes to this PR, is it ready for re-review ?

VAveryanov8 · 2025-02-04T13:58:23Z

> @VAveryanov8 I see some recent pushes to this PR, is it ready for re-review ?

Yes, it's ready for re-review

karol-kokoszka

Thanks !

@mikliapko There are SCT tests that were trying to restore multiDC cluster, but they were failing due to the problems with encryption at rest enabled on the cluster.
Two datacenters were using different encryption keys.

This PR is expected to fix this problem. AFAIR, you changed these tests to singleDC later. Is it possible to validate this PR against the SCT tests with the multiDC restore with encryption at rest enabled ?

karol-kokoszka · 2025-02-06T13:28:05Z

@VAveryanov8 Please rebase on master. You will have conflicts related to the new submodule, but they are supposed to be very easy to fix.

mikliapko · 2025-02-06T13:29:54Z

> @mikliapko There are SCT tests that were trying to restore multiDC cluster, but they were failing due to the problems with encryption at rest enabled on the cluster. Two datacenters were using different encryption keys.

Yep, we can do it, just give me the SM build that can be used

karol-kokoszka · 2025-02-06T14:50:20Z

@mikliapko Here is the manager-build triggered on the current branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/919/

You will have to edit the test to include --dc-mapping source_dc1=>target_dc1;source_dc2=>target_dc2 into the sctool restore call.

This adds support for --dc-mapping flag to restore command. It specifies mapping between DCs from the backup and DCs in the restored(target) cluster. All DCs from the source cluster should be explicitly mapped to all DCs in the target cluster. The only exception is when source and target cluster has exact match: source dcs == target dcs. Only works with tables restoration (--restore-tables=true).

Syntax: "source_dc1,source_dc2=>target_dc1,target_dc2"
Multiple mappings are separated by semicolons (;). Exclamation mark (!) before a DC indicates that it should be ignored during restore.

Examples:
"dc1,dc2=>dc3" - data from dc1 and dc2 DCs should be restored to dc3 DC.
"dc1,dc2=>dc3,!dc4" - data from dc1 and dc2 DCs should be restored to dc3 DC. Ignoring dc4 DC from target cluster.
"dc1,!dc2=>dc2" - data from dc1 should be restored to dc2 DC. Ignoring dc2 from source cluster.

Fixes: #3829


This introduces LocationInfo, a direct replacement for locationHosts that carries more information about a Location: which DCs are actually stored in it, which hosts can access it, and the list of manifests from it. LocationInfo is also created with respect to the DC mappings.
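
As a rough picture of what such a structure might hold, here is a minimal sketch. The field names and types are assumptions made for illustration only; the real LocationInfo in `pkg/service/restore` may look different. The `AllHosts` method mirrors the behavior mentioned in a later commit of this PR (returning only unique hosts).

```go
package main

import (
	"fmt"
	"sort"
)

// LocationInfo is an illustrative stand-in; all fields are assumed, not
// copied from the PR.
type LocationInfo struct {
	Location  string              // backup location
	DCs       []string            // DCs whose data is stored in this location
	DCHosts   map[string][]string // DC -> hosts that can access the location
	Manifests []string            // identifiers of manifests found in the location
}

// AllHosts returns every host with access to the location, each listed once.
func (l LocationInfo) AllHosts() []string {
	seen := make(map[string]struct{})
	var hosts []string
	for _, dcHosts := range l.DCHosts {
		for _, h := range dcHosts {
			if _, ok := seen[h]; ok {
				continue // host already collected via another DC
			}
			seen[h] = struct{}{}
			hosts = append(hosts, h)
		}
	}
	sort.Strings(hosts) // map iteration order is random; sort for stable results
	return hosts
}

func main() {
	info := LocationInfo{
		Location: "s3:example-bucket",
		DCs:      []string{"dc1", "dc2"},
		DCHosts: map[string][]string{
			"dc1": {"192.168.1.1", "192.168.1.2"},
			"dc2": {"192.168.1.2", "192.168.1.3"},
		},
	}
	fmt.Println(info.AllHosts())
}
```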

This simplifies `--dc-mapping` usage to a single use case: 1-1 DC mapping. This means that squeezing (restoring dc1 and dc2 into dc3) or extending (restoring dc1 into dc1 and dc2) DCs is not supported when `--dc-mapping` is provided. So the syntax is also simplified:
```
source_dc1=>target_dc1;source_dc2=>target_dc2
Where
  => is used to separate source dc name and target dc name
  ;  is used to separate multiple mappings
```
If `--dc-mapping` is not provided, the current behavior is preserved: each node with access to a DC can download its data. It is also allowed to provide only a subset of DCs, ignoring a source DC, a target DC, or both.



mikliapko · 2025-02-10T12:29:15Z

> @mikliapko Here is the manager-build triggered on the current branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/919/
>
> You will have to edit the test to include --dc-mapping source_dc1=>target_dc1;source_dc2=>target_dc2 into the sctool restore call.

Testing with the provided build.

To make it work, dc mapping should be taken in quotes (see examples below):

ubuntu@manager-regression-adapt-fo-monitor-node-55ef36da-1:~$ sctool restore -c test --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20250210120404UTC --dc-mapping eu-west-2scylla_node_west=>eu-west-2scylla_node_west;us-eastscylla_node_east=>us-eastscylla_node_east
Error: invalid argument "eu-west-2scylla_node_west=" for "--dc-mapping" flag: invalid syntax, mapping should be in a format of sourceDcs=>targetDcs, but got: eu-west-2scylla_node_west=

-bash: us-eastscylla_node_east=: command not found
ubuntu@manager-regression-adapt-fo-monitor-node-55ef36da-1:~$ sctool restore -c test --restore-tables --location s3:manager-backup-tests-us-east-1 --snapshot-tag sm_20250210120404UTC --dc-mapping "eu-west-2scylla_node_west=>eu-west-2scylla_node_west;us-eastscylla_node_east=>us-eastscylla_node_east"
restore/e3cc3b69-0c48-4109-ad78-fbe6257f6eb9

@VAveryanov8 @karol-kokoszka Is it inevitable in this case?

VAveryanov8 · 2025-02-10T12:33:28Z

> To make it work, dc mapping should be taken in quotes (see examples below):

For this particular syntax yes, quotes are required.

But the good news is that I'm gonna get rid of it in favor of a simpler syntax using just equals (=) instead of arrow (=>) and comma (,) instead of semicolon (;). Then I suppose it will be possible to use it without quotes.

mikliapko · 2025-02-10T13:01:03Z

> > To make it work, dc mapping should be taken in quotes (see examples below):
>
> For this particular syntax yes, quotes are required.
>
> But the good news is that I'm gonna get rid of it in favor of a simpler syntax using just equals (=) instead of arrow (=>) and comma (,) instead of semicolon (;). Then I suppose it will be possible to use it without quotes.

Great, please let me know when you have a build with the new implementation.

To run a restore task for multiDC cluster with EaR enabled (otherwise fails #1), the special flag has been introduced in Manager (#2) to map backed up DC with DC under restore, for example, sctool restore ... --dc-mapping dc1=dc1,dc2=dc2 The change introduces this new flag into `create_restore_task` method and makes sure that if Scylla cluster has more than 1 datacenter - the restore task will be triggered with this flag applied. refs: 1. scylladb/scylla-manager#3871 2. scylladb/scylla-manager#4213

This simplifies syntax of dc-mapping by leveraging pflag.StringToString functionality. Also extends integration tests with validation that each node should download only tables from corresponding DCs when dc-mapping is provided.
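
For readers unfamiliar with string-to-string flags, the idea can be approximated with the standard library alone. The sketch below is a hypothetical stand-in built on `flag.Value`; it is not pflag's implementation, and pflag's actual parsing may differ in details. `dcMappingFlag` is a name invented for this example.

```go
package main

import (
	"flag"
	"fmt"
	"sort"
	"strings"
)

// dcMappingFlag is a hypothetical stdlib stand-in for a string-to-string
// flag: it collects source_dc=target_dc pairs into a map.
type dcMappingFlag map[string]string

// String implements flag.Value and renders the pairs in sorted order.
func (f dcMappingFlag) String() string {
	pairs := make([]string, 0, len(f))
	for source, target := range f {
		pairs = append(pairs, source+"="+target)
	}
	sort.Strings(pairs)
	return strings.Join(pairs, ",")
}

// Set implements flag.Value and parses "src1=dst1,src2=dst2".
func (f dcMappingFlag) Set(value string) error {
	for _, pair := range strings.Split(value, ",") {
		source, target, ok := strings.Cut(pair, "=")
		if !ok || source == "" || target == "" {
			return fmt.Errorf("%q must be in source_dc=target_dc format", pair)
		}
		f[source] = target
	}
	return nil
}

func main() {
	mapping := dcMappingFlag{}
	fs := flag.NewFlagSet("restore", flag.ContinueOnError)
	fs.Var(mapping, "dc-mapping", "comma-separated source_dc=target_dc pairs")

	// No shell quoting is needed here: '=' and ',' are not shell metacharacters.
	if err := fs.Parse([]string{"--dc-mapping", "dc1=dc1,dc2=dc2"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(mapping)
}
```

Because the value contains only `=` and `,`, it reaches the program as a single argument without quotes, which is the practical gain over the earlier `=>` and `;` syntax.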

Michal-Leszczynski

The PR looks nice!
Just one question about the comment.

pkg/service/restore/worker.go

karol-kokoszka · 2025-02-11T13:20:55Z

@mikliapko Here is the latest build from this branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/923/
It contains all recent changes.


mikliapko · 2025-02-12T10:38:00Z

> @mikliapko Here is the latest build from this branch https://jenkins.scylladb.com/view/scylla-manager/job/manager-master/job/manager-build/923/ It contains all recent changes.

Retesting here https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/mikita/job/manager-master/job/ubuntu22-sanity-test/69/

VAveryanov8 marked this pull request as ready for review January 17, 2025 08:23

VAveryanov8 requested review from karol-kokoszka and Michal-Leszczynski as code owners January 17, 2025 08:23

Michal-Leszczynski reviewed Jan 17, 2025


karol-kokoszka reviewed Jan 31, 2025


VAveryanov8 force-pushed the va/dc-mapping branch from 4ed7c9d to d237454 Compare February 4, 2025 09:20

VAveryanov8 requested review from Michal-Leszczynski and karol-kokoszka February 4, 2025 13:57

karol-kokoszka reviewed Feb 6, 2025


VAveryanov8 force-pushed the va/dc-mapping branch from f0df68a to 9852f42 Compare February 6, 2025 16:19

VAveryanov8 added 12 commits February 6, 2025 17:30

feat(restore): uses dc mappings for restoring tables

6860c71

This introduces use of dc mappings when restoring tables. Now each dc is downloading only data from corresponding dc(s) accordingly to user provided mapping. Also some dcs can be explicitly ignored. Fixes: #3829

feat(tests): adds third cluster (dc3) to docker setup

4144cf1

This adds another cluster to docker setup, so we can have integration tests for dc-mappings. Fixes: #3829

chore(test): adds dc-mapping integration tests

41db186

Fixes: #3829

fix: adds copyright :D

2ccc6e7

fix(test): fixes cluster setup

07412f8

refactor(dcmapping): replaces !dc with --skip-dc-mapping-validation

cc9b17d

This removes support for !dc syntax and introduces new --skip-dc-mapping-validation flag which can be used when partial restore is needed.

refactor: deleted third, but adds dc3 to the second

c1cc753

This removes third cluster, but adds another data center to the second cluster.

refactor(tests): removes dc3 from second cluster

780bec0

This removed dc3 from second cluster as `--dc-mappings` was simplified and there is no necessity in having cluster with another dc name.

chore(tests): alignes integration tests with current behavior.

ee52238

As this pr does not introduce any breaking changes anymore, integration tests can be simplified as well.

VAveryanov8 added 4 commits February 6, 2025 18:06

chore(docs): generates docs (make docs)

f3c2273

fix: locationInfo.AllHosts should return only uniq hosts

9b659cf

This makes locationInfo.AllHosts to return only uniq hosts. Also some small fixes in tests and in error messages.

fix(docs): fixes formatting in docs

ba22c43

fix: fixes the backward compatibility when dc-mapping is not set

99af84e

This fixes the behavior when dc-mapping is not set.

VAveryanov8 force-pushed the va/dc-mapping branch from 9852f42 to 99af84e Compare February 6, 2025 17:08

fix(test): fixes integration test after rebase

524792c

Michal-Leszczynski reviewed Feb 10, 2025


mikliapko mentioned this pull request Feb 10, 2025

Adapt for multi dc restore scylladb/scylla-cluster-tests#10041

Draft

4 tasks

VAveryanov8 added 4 commits February 10, 2025 19:27

refactor: simplifies syntax of dc-mappings

010e891

This simplifies syntax of dc-mapping by leveraging pflag.StringToString functionality. Also extends integration tests with validation that each node should download only tables from corresponding DCs when dc-mapping is provided.

refactor: small changes accordingly to code review comments

54f9982

chore(docs): updates dc-mapping docs

beabd75

fixes: small fixes in integration tests

18d5aaf

VAveryanov8 mentioned this pull request Feb 11, 2025

TestRestoreSchemaVersionedIntegration is flaky #4246

Closed

VAveryanov8 requested a review from Michal-Leszczynski February 11, 2025 09:05

Michal-Leszczynski reviewed Feb 11, 2025


pkg/service/restore/worker.go (outdated)

fix: makes comment more correct

acf1c03

Michal-Leszczynski self-requested a review February 11, 2025 13:35

Michal-Leszczynski approved these changes Feb 11, 2025

